This report explores a dataset of prosper loan data. This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information. In this report, I selected 23 variables and created a few new variables through exploratory analysis.
========================================================
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ CreditGrade : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
## $ DateCreditPulled : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
## 'data.frame': 113937 obs. of 25 variables:
## $ Term : Factor w/ 3 levels "12","36","60": 2 2 2 2 2 3 2 2 2 2 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ListingCategory..numeric.: int 0 2 0 16 2 1 1 2 7 7 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ ProsperRating : Factor w/ 7 levels "AA","A","B","C",..: NA 2 NA 2 5 3 6 4 1 1 ...
## $ ListingCategory : Factor w/ 21 levels "0 - Not Available",..: 1 3 1 17 3 2 2 3 8 8 ...
Several sharp line on whole number, such as 10000, 15000 and 20000. It is reasonable because people tend to borrow whole numbers of amount. Next, let’s take a look at the statistic of loan amount.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The minimum loan is 1000, with the median of 6500 and mean of 8337, the maximum loan requested is 35000.
The most common loan amount.
## [1] 4000
Interesting to note that 4000 is the most common amount people borrowed, followed by 10000 and 15000.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
Monthly loan payments exhibits positive skew. The minimum monthly payment is 0, with the median of 217.7 and mean of 272.5, the maximum monthly payment is 2251.5.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
Monthly income exhibits positive skew. The minimum value is 0, with the median of 4667 and mean of 5608, the maximum monthly payment is 1750003.
A debt income ratio is the percentage of a consumer’s monthly gross income that goes toward paying debts.
Removing the upper quantile on the data we got:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
DI ratio exhibits slight positive skew. The maximum is 10.01. The data is capped at 10.01, debt-to-income ratio larger then 1000% will be returned as 1001%. The minimum value is 0. With the median of 0.22 and mean of 0.276.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
Credit ranges are normally distributed with mean/med = 685/680. Note: Only CreditScoreRangeLower is used. CreditScoreRangeHigher has very similar distribution but is 20 points higher.
I created a new variable TotalMonthlyDebt, to estimate the monthly debt at inception of loan. (TotalMonthlyDebt = DebtToIncomeRatio * StatedMonthlyIncome)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 637.5 1061.5 1224.9 1622.4 171004.2 8554
The resulting distribution is positively skewed.
## Employed Full-time Not available Not employed
## 2255 67322 26355 5347 835
## Other Part-time Retired Self-employed
## 3806 1088 795 6134
There are 67322 employed borrowers or about 59% of the total borrowers.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 26.00 67.00 96.07 137.00 755.00 7625
Employment duration exhibits positive skew with mean 96 and median 67. Most people are employed shorter than 1 year.
## # A tibble: 21 x 3
## ListingCategory..numeric. DollarTotal CountTotal
## <int> <int> <int>
## 1 1 577736197 58308
## 2 0 106096621 16965
## 3 3 64175191 7189
## 4 7 62035775 10494
## 5 2 60148466 7433
## 6 6 12861665 2572
## 7 4 10913226 2395
## 8 13 10549832 1996
## 9 15 9929936 1522
## 10 14 7684187 876
## # ... with 11 more rows
Most people borrow to consolidate their debts, in total there are 58308 case or about 51.17%. Interestingly it seems there are a lot of loans not classified. More on this in the multivarate section.
## # A tibble: 10 x 3
## LoanYear DollarTotal CountTotal
## <fct> <int> <int>
## 1 2014 144995536 12172
## 2 2013 362170278 34345
## 3 2012 153175116 19553
## 4 2011 75138013 11228
## 5 2010 26940486 5652
## 6 2009 8914396 2047
## 7 2008 69561850 11552
## 8 2007 80787786 11460
## 9 2006 28132199 5906
## 10 2005 78687 22
Here the loan origination dates are decomposed using lubridate and plotted by month and year. It is very interesting that the loans by year plot and table shows a drop off in 2009.
Revolving Credit Balance is the total outstanding balance that the borrower owes on open credit cards or other revolving credit accounts.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 3121 8549 17599 19521 1435667 7604
The median is 8549 and mean 17600. The maximum value is 1436000. The minimum and the most common amount is 0.
Bankcard utilization is the sum of the balances owed on open bankcards divided by the sum of the card’s credit limits. Lower usually means better.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.310 0.600 0.561 0.840 5.950 7604
There is a peak in the plot near 100% which means a lot of borrowers who have almost 100% Bankcard utilization. There are some borrowers who have utilization > 1.00 (100%).
Number of borrowers with BankcardUtilization < 0.05:
## [1] 9361
Number of borrowers near 1:
## [1] 9532
Number of borrowers with BankcardUtilization >= 1:
## [1] 2574
There are 2574 borrowers who has bankcard utilization > 1. That means they owed more then the credit limit.
##
## Cancelled Chargedoff Completed
## 5 11992 38074
## Current Defaulted FinalPaymentInProgress
## 56576 5018 205
## Past Due (>120 days) Past Due (1-15 days) Past Due (16-30 days)
## 16 806 265
## Past Due (31-60 days) Past Due (61-90 days) Past Due (91-120 days)
## 363 313 304
Most loans in the whole data set are completed and current. Later I will break it down by year.
## [1] "# of Loans by Term (months)"
## 12 36 60
## 1614 87778 24545
Most people have 3 year loans.
Most people have 3 years loan. 5 years loan starts to show up from 2011, and the count of 5 years loan continue to increase until 2014
Most defaults occurred in 2006-2008.
The loans that were labelled ‘Past Due’ peaked and plateaued from 2012 to 2013.
## [1] "Is the borrower a homeowner? "
## False True
## 56459 57478
Borrowers are mostly employed and about half are homeowners.
There are 113,937 loans in the dataset, 23 main features out of 81 were selected in this report and listed below:
The main features for the loans are loan amount, term and borrowerRate. The main features for borrowers are credit score, debt to income ratio, monthlyLoanPayment and income.
Loan status and category will help to support a detailed picture that which type of loans are current, defaulted, delinquent etc.
I created a new variable TotalMonthlyDebt, to estimate the monthly debt at inception of loan, by multiplying the DI ratio by the stated monthly income.
To be better investigate the dataset with yearly and seasonality analysis. I broke out the loan origination date in LoanYear, LoanMonth using lubridate.
All of the money related variables (LoanOriginalAmoun, RevolvingCreditBalance and monthlyPayment etc.) are positively skewed. I do not transform the data for univariate analysis.
First, let’s have a look at the relationship between LoanOriginalAmount with ListingCategory.
The amount of Personal loan and student use are the lowerst two. The baby and adoption loans looks similar to debt consolodation. Interesting to note that wedding loans is quite high. Let’s take a look at the numbers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 4000 7500 8836 13000 35000
The median and mean are 7500 and 8836, with maximum value up to 35000.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2500 4000 4873 6000 25000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 9000 9794 15000 35000
It is not surprised that employed borrowers are requesting loan higher then not employed borrowers The median and the mean of not employed borrower are 4000 and 4873 vs 9000 and 9794 from employed borrower.
Most of the Loan are below 10000 and monthly income is under 100000. The quantile shows that the higher the income the higher the median of the loan original amount.
The number of the data that have original amount < 10000 and annual income less then 100000 is:
## [1] 65553
which is around 57.53% of the data.
Interestingly people who borrow > 25000 has annual income of > 100000, it looks like there is some kind of rule, that if you borrow > 25000 the the minimal annual income is 100000.
Let’s verify this a bit.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 100000 115000 137250 160179 175000 800000
And yes the minimum annual income is 100000 when the loanOrigianlAmount > 25000. Next, let’s check the correlation between the 2 variables.
## [1] "Correlation between Annual income and loan amount:"
## [1] 0.2012595
## [1] "Correlation between Annual income < 100000 and loan amount < 10000:"
## [1] 0.1982141
This indicate weak positive relationship between this two variables
## <ScaleContinuousPosition>
## Range:
## Limits: 0 -- 820
## [1] "Relation between loan amount with credit score: "
## [1] 0.3408745
Slight positive correlation. The higher credit score, the larger the loan the borrower can get.
That is not too informative. Let’s check the correlation between these 2 variables.
## [1] 0.06291678
The correlation is not significant, it show no or neglible relationship.
The plots shows the higher your credit score the lower is your borrower rate.
Let’s take a look at the correlation.
## [1] -0.4615667
This indicate moderate negative relationship between this two variables
Let’s see the relationship of the borrower rate with other variables.
As BankcardUtilization, DeliquenciesLast7Years and PublicRecordsLast10Years increases so is the borrower rate. On the other hand the lower the RevolvingCreditBalance the lower the BorrowerRate.
Let’s check the correlation.
Correlation between BorrowerRate with BankcardUtilization:
## [1] 0.255482
Correlation between BorrowerRate with RevolvingCreditBalance:
## [1] -0.05960823
Correlation between BorrowerRate with AnnualIncome:
## [1] -0.0889818
BankcardUtilization has a weak positive relationship. The other factor has a negligible relationship.
## [1] "Correlation between BorrowerRate with AnnualIncome: "
## [1] 0.4733034
This shows a moderate positive correlation. You have more debt as you increase the # of credit lines or vice-versa.
## [1] "Correlation between loanOriginalAmount with borrowerRate: "
## [1] -0.3289599
Moderate negative correlation between loan amount and borrower rate. Seems counterintuitive, lower interest rate on larger loans.
## [1] "Correlation between StateMonthlyIncome with TotalMonthlyDebt: "
## [1] 0.3604392
Monthly income and total monthly debt has a positive relationship of R^2 of .36.
The homeowners have a higher dispersion whereas the non-homeowners are more concentrated under $5000 income.
Taking a closer look at spike in the histogram of BorrowerRate, these high interest loans were made mainly for debt consolidation in 2007-2010.
## Warning: Removed 100728 rows containing non-finite values (stat_bin).
The credit score ranges are shown, the higher the borrower rate (cost), the worse the credit rating.
Mean loans amounts drops in 2009 and increased in 2013.
There is a increased average loan amounts and variance during the holiday seasons and a pull back during the spring and summer months.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2000 3500 4694 5000 25000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3000 5000 7276 10000 35000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 8000 11500 12370 15000 35000
And entry with term 60 has median of 11500, 5000 for 36 and 3500 for 12.
In this part, I wanted to see how several features affect the borrower rate and loan amount, and the relationships between some other features.
For instance:
-The borrower rate increases as debt to income ratio increases. Moreover, The higher credit score, the lower is your borrower rate.
-The term seems to be related with loan original amount, the bigger the amount the longer the term.
-The higher credit score, the larger the loan the borrower can get.
There’s a negative relationship between interest rates and loan amount, the larger the loan, the lower the rate on average. That was mostly due to them having higher credit scores.
People who are employed on average take out larger loans than unemployed people.
Average loan amounts and variance are seasonal (higher during holidays)
There’s a strong positive relationship between open credit lines and total monthly debt
Monthly income and monthly debt have a positive relationship.
No.
The 2 strongest relationship I found was between credit score and borrower rate, and between open credit lines and total monthly debt. This makes sense since higher credit scores represent better credit decisions and can make creditors more confident that you will repay your future debts as agreed. Credit scores should be directly related to the cost of borrowing (borrower rate).
The plot above shows the distribution of borrower rate with credit score and loan amount. We see that borrowers with higher credit score tend to have lower borrower rate and higher loan amount.only credit scores from 660 (1st quartile) and above are selected for better visual presentation.
Payment to income ratio was calculated by dividing MonthlyLoanPayment by StatedMonthlyIncome This plot shows what the payment to income ratio would be by year and seperated by IncomeRange. It seems most of the borrowers with lower income ranges take on larger loans. Most of the borrowers seem to have payment/income ratio close to 0.025-0.075%. The $100k+ income range have noticeably lower payment/income ratio. Moreover, all income ranges have an increased payment/income ratio from 2009 to 2014.
Comparing DI ratio, most of the borrowers seem to have DI ratio close to 20-30%, which are much lower than the payment/income ratio (2.5 -7.5%). This result indiate most of the borrowers didn’t pay off the debt everymonth. The $100k + income range have noticeably lower DI ratio at around 15-20%.
It looks like the uptick in loans amount in 2009-2013 grew primarily in income range of $50-75k, $75-100k and $100k+.
When looking at the loan amounts vs borrower rate, the credit scores demarcated borrowers by credit worthiness.
I looked at payment/income ratio and DI ratio and observed the higher the income, the lower the percentage of debt. Additionally, it appears the platform after a few years of maturity is beginning to lend to slight more indebted borrowers in the income ranges of $50-75k, $75-100k and $100k+. Looking at the differentce between payment/income ratio and DI ratio, it seemed like most of the borrowers didn’t pay off their debt each month.
loan categories were bucketed by year and clearly it looks like pre-2010 they did not capture the loan category and debt consolidation dominated the cateogory every year subsequently.
This graph shows the distribution of loan amounts by year seperated by credit score. It shows a substantial rebound in loan amounts from 2010-2013. Borrowers with larger loan amount tend to with higher credit scores. I subsetted only credit scores from 660 (1st quartile) and above for better visual presentation
The borrowers with high credit scores are in red on the left. They generally have lower interest rates and larger loan amounts. I subsetted only credit scores from 660 (1st quartile) and above for better visual presentation.
The Prosper data has a lot of variables, for this project I limited the number of variables to investigate. The difficulties I had with the data mainly from understanding the variables and then selecting the appropriate ones to analyze. After much thought, I use the variables about the finatial information and identity of borrowers and loans. I do this because I assume these are the metric that is important for lender to look at before actually lending money.
This report shows Prosper struggled initially with their first few years of loans (2006-2009). However, their business grew very quickly by the SEC in 2009. Their main loan categories are in debt consolidation and I believe the low interest rate environment has helped the supply of lenders looking for yield. This was a great learning experience about the P2P lending model.
There are a number of different ways to take this project further. Firstly, I’ve focused on a small subset of the variables available in the dataset to explore the investors side a bit more; look at investor profit and losses and their general activity in the peer-to-peer lending industry. Also, I’d like to incorporate some machine learning algorithm such as logistic regression and cluster into my exploration to predict the number of ongoing loans that will end up being defaulted. I believe that an accurate model will be very beneficial to Prosper.